-
Notifications
You must be signed in to change notification settings - Fork 676
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[New Feature] Add_Data_IO_Functions #281
Conversation
Hi @lightaime , I have finished the first version data I/O module, for the linting test, I used # type: ignore to pass the test (see my last two commits) since I don't know how can I fix it in another way. Please feel free to review my code and give me suggestions or comments, thank you~ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code is good overall. You bring in some binaries for the purpose of testing. They are smaller than 1 MB and it is fine. However test/data_samples/test_hello_multi.pdf is quite big, 300kb. Let it be. Also I am not sure what to do with this poetry lock. Is it possible to revert all hunks except for the newly added packages?
camel/functions/data_io_functions.py
Outdated
from io import BytesIO | ||
from typing import Any, Dict, List, Optional | ||
|
||
import docx2txt # type: ignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove # type: ignore
, instead add them to mypy exceptions in pyproject.toml.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Dmitrii, thank you for your review, I have made the adjustment as you suggested and I used poetry lock to update the poetry.lock file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Useful feature, Thanks! Just leave some comments of docs and code style. Committing poetry.lock is necessary since this PR update dependencies
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Wendong. Thank you for your high quality code. Please feel free to merge it if no more comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All good. Approved.
The docstring format of this PR is not consistent with others. We may need to fix the format. |
💡 User Story:
As an AI engineer,
I want to add data input capability to CAMEL
So that our user can take different kinds of files as input
🛒 Types of Changes:
🚀 Acceptance Criteria:
Functional code can handle with:
.docx
.pdf
.txt
.html
.json
Convert information from all these file types into string
Apply the functional code to CAMEL master branch successfully, and pass all tests.
🔀 Related User Stories:
No story related
🛠️ Technical Details:
Use BytesIO, reference Langchain data input module
📑 Tasks:
.docx
.pdf
.txt
.html
.json
🔑 Dependencies:
No dependencies needed
🤝 Required Prerequisites: